Abstract

This paper shows the necessity of distinguishing different referential uses of noun
phrases in machine translation. We argue that differentiating between the generic,
referential and ascriptive uses of noun phrases is the minimum necessary to generate
articles and number correctly when translating from Japanese to English. Heuristics for
determining these differences are proposed for a Japanese-to-English machine translation
system. Finally, the results of using the proposed heuristics are shown to have raised
the percentage of noun phrases generated with correct use of articles and number in the
Japanese-to-English machine translation system ALT-J/E from 65% to 77%.
1 Introduction



Determining the referential property of noun phrases is essential not only to understanding
a text, but also to deciding how to generate it in English. This paper proposes a heuristic
algorithm to determine the referential properties of noun phrases in a Japanese text. The
original motivation of the research was to improve the quality of English output by NTT
Communication Science Laboratories' Japanese-to-English machine translation system
ALT-J/E (Ikehara et al. 1991; Ogura et al. 1993). We expect, however, that the results will also
be useful for text extraction and general text understanding.

In this paper we use the term noun phrase reference to describe the relation between 
a noun phrase and what it stands for when it is used. We distinguish between three uses of 
noun phrases, two referential and one non-referential. A noun phrase can be used to refer 
in two different ways: generic where a noun phrase is used to refer to a whole class, and 
referential where a noun phrase refers to a particular entity or entities. A third use is 
ascriptive where a noun phrase is used not to refer to anything but rather, normally with a 
copular verb, to ascribe a property to some referent. Although ascriptive noun phrases are 
non-referring, we will refer to all three uses under the general term of noun phrase reference. 



*Now at Doshisha University, Kyoto, Japan: <kawaoka@wise.doshisha.ac.jp>.

^This paper was presented at the Sixth International Conference on Theoretical and Methodological Issues 
in Machine Translation (TMI '95) and appears in the proceedings: pp 1-14. 
This three-way distinction of noun phrase reference was introduced in Bond et al. (1994)
and used as a base to determine the countability and number of noun phrases in
Japanese-to-English machine translation. In this paper we define exactly what is meant by the three
kinds of reference and show how the distinction is essential in the generation of articles.

This paper is structured as follows. First, we define the three kinds of referentiality which 
we distinguish and justify the definitions on theoretical and practical grounds, comparing them 
with those suggested by other researchers. We then describe in detail a heuristic method for 
determining noun phrase reference in Japanese sentences. Next, we show how the distinction 
is used in a Japanese to English machine translation system to generate articles and number. 
Finally, we look at experimental results gained by implementing the proposed methods and 
compare them to those achieved by an earlier version of the same system, and by other 
systems. 

2 Definition of noun phrase reference 

Noun phrase reference is of fundamental importance in any discussion of meaning (Lyons
1977). In English, it is also important in determining how articles should be used. In this
section we give a more detailed definition of the three kinds of noun phrase reference under 
discussion and compare them with the definitions used in other machine translation systems. 

Generic: Noun phrases with generic reference denote an entire class: e.g. mammoths in
Mammoths are extinct. In English generic noun phrases can normally be expressed in
three ways, as discussed in Section 4.1.



Referential: Referential noun phrases are those that refer to some entity or entities in the 
discourse world: e.g. a mammoth in There is a mammoth in my garden! Referential 
noun phrases are plural if there is more than one discrete referent, and are marked for 
definiteness. 

Ascriptive: Ascriptive noun phrases are used with a copular verb, or in an appositive
expression, to ascribe a property to their subject: e.g. a mammoth in That animal is
a mammoth. Because ascriptive noun phrases are non-referring, they cannot be the
antecedent of other noun phrases.

Zelinsky-Wibbelt (1992) distinguishes between generic and identifying, which appear
to be equivalent to our generic and referential. Zelinsky-Wibbelt's examples do contain
ascriptive noun phrases, for example a human being in 'A spectator is a human being', but
they appear to be treated as adjective phrases in the rules (for example in their rule 14
(p. 797 op cit), where the complement of the copulative predicate with a generic subject is
an evaluative adjective phrase). If the definition of adjective phrase has been expanded to
include ascriptive noun phrases^1, then our analysis is compatible. Unfortunately, there is
no discussion in Zelinsky-Wibbelt as to how effective their rules are when actually used in a
machine translation system, so we cannot make a quantitative comparison.

Murata and Nagao (1993) distinguish between generic and non-generic, which is further divided
into definite and indefinite, using heuristics similar to rewriting rules in expert systems.
They make no distinction between referential and ascriptive for non-generic noun
phrases. This leaves open the possibility of conflict with their rule that a noun phrase will be
definite if it has been presented previously. Consider the following sentence^2: zō-wa honyurui
da-si, manmosu-mo honyurui da. 'Elephant-TOP mammal be-and mammoth-ALSO mammal
be.' Elephants are mammals and mammoths are also mammals. Using the rules given, this will
become Elephants are mammals and mammoths are also the mammals. Distinguishing
between referential and ascriptive prevents this kind of problem from occurring. We
compare their results to ours quantitatively in Section 6.

1 We feel this expanded definition is plausible, since the copula and ascriptive noun phrase combination
fulfills the same semantic role as the copula and adjective phrase, that is, to ascribe a property.



3 Determination of noun phrase reference 

All proper nouns are, by definition, referential. The algorithm used to determine the
referential property of noun phrases headed by common nouns is shown in Figure 1. The
algorithm presented is based on single sentences; it does not address the considerable problems
of using information from outside the sentence being considered^3.

It is possible for the algorithm to be applied to the Japanese parse tree as part of the
semantic analysis^4. In ALT-J/E, however, the algorithm is applied after the semantic analysis
has finished, during the transfer stage, because much of the semantic information is stored 
in the transfer dictionaries where the combination of Japanese and English makes it easy to 
disambiguate word senses. The overall process of translation in ALT-J/E is divided into 
seven parts. First, the system splits the Japanese text into morphemes and assigns parts of 
speech. Second, it parses the segmented text, often giving multiple possible interpretations. 
Third, it rewrites complicated Japanese expressions into simpler ones. Fourth, ALT-J/E 
semantically evaluates the various interpretations. Fifth, syntactic and semantic criteria are 
used to select the best interpretation. Sixth, the selected interpretation is transferred into 
English. Finally, the English sentence is adjusted to give the correct inflectional forms. The 
algorithm described in this section has been implemented as part of the sixth stage. However, 
it could be implemented as part of the fifth stage. 

Rules are applied in the order shown in Figure 1, with later rules overruling earlier ones.

The default assumption is that a noun phrase will be used to refer to some specific entity 
or entities in the discourse world, i.e. that it is referential. 

There are five rules that are applied at the sentence level, which use the meanings of verbs
combined with the semantic categories of nouns^5. These can all be overridden by subsequent
rules. The subjects of verbs that predicate over an entire class, and the objects of verbs which
predicate emotive action or emotive state, are generic. Verbs that trigger these rules,
e.g. evolve, die out, are marked in the lexicon (Bond et al. 1993). For copulas, the subject is
generic if its semantic category is a descendant of the semantic category of the object, while



2 Examples are given with the (romanized) Japanese original, a gloss and the human translation. The 
examples have been simplified to exemplify points more clearly; a new translation has been made for each 
simplified sentence. Japanese particles are glossed as follows: TOP for wa which marks the topic, OBJ for o 
which marks the object and GEN for no which shows a genitive relation. 

3 Algorithms to use contextual information from outside the sentence are currently being implemented. 

4 For information retrieval it is obviously essential to determine the referentiality of noun phrases as part of 
the source language analysis. 

5 The meanings of nouns are given in terms of a semantic hierarchy of 2,800 nodes. Each node is called a
semantic category. Edges in the hierarchy represent IS-A relationships, so that the child of a semantic category
IS-A instance of it. For example, organ IS-A body-part (Ogura et al. 1993).
1. The default is referential

2. Sentence level rules

(a) the subject of a verb marked in the lexicon as predicating over an entire class is generic:
manmosu-wa zetsumetsu-shita 'Mammoths died out'

(b) if the semantic category of the subject of a copula is a descendant of the semantic category
of the object then the subject is generic:
manmosu-wa dobutsu-da 'Mammoths are animals'

(c) the object of a verb which predicates emotive action or emotive state is generic:
watashi-wa manmosu-wo suki-da 'I like mammoths'

(d) the complement of a copula is ascriptive:
manmosu-wa dobutsu-da 'Mammoths are animals'

(e) appositive noun phrases are ascriptive:
denwagaisha-no NTT 'NTT, a telephone company'

3. Modification by embedded sentences

(a) a noun phrase whose head is modified by a tensed relative clause is referential:
kinou kita otoko 'the man who came yesterday'

4. Post-modification by setsubiji 'suffixes' and joshi-sōtōgo 'pseudo-particles'

(a) the modificant of muke 'aimed at', yō 'for' ... is generic:
josei-muke-no zasshi 'a magazine for women'

(b) the modificant of -to-iu-no-wa 'things called' is generic:
kikai hon'yaku-to-iu-no-wa muzukashii 'Machine translation is difficult'

5. Modification by demonstratives, numerals and the genitive construction no 'of'

(a) a noun phrase whose head is modified by a demonstrative or numeral is referential:
kono otoko 'this man', futari-no otoko 'two men'

(b) a noun phrase whose head is modified by the genitive construction is referential:
hana-no saki 'the tip of my nose'

6. A noun phrase with a 'unique' referent is referential:
chikyu 'the earth'

Figure 1: Determination of noun phrase referentiality
its complement is taken to be ascriptive by default^6. Finally, appositive noun phrases will
be judged to be ascriptive, as though they were the complement of a copula.

Recall that these rules are only applied if the noun phrase in question is headed by a
common noun. In sentence (1), the semantic category of meeting place is actual place,
which is a child of the semantic category of Aoi hall, public place. Aoi hall, however, is a
proper noun so the rule is not applied.

(1) Jap: kaijo-wa Aoi-kaikan (j).
Gloss: meeting place-TOP Aoi hall is
Eng: The meeting place is the Aoi Hall.
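
The descendant test behind the copula rule can be sketched as a walk up the IS-A hierarchy. The following is a minimal illustration under stated assumptions, not ALT-J/E's implementation: the tiny hierarchy, the category names and both function names are hypothetical, and the proper-noun check simply mirrors the restriction described above.

```python
# Toy IS-A hierarchy (child -> parent). These categories are illustrative
# assumptions; the real hierarchy has about 2,800 nodes.
PARENT = {
    "mammoth": "mammal",
    "elephant": "mammal",
    "mammal": "animal",
    "animal": "concrete",
}


def is_descendant(category, ancestor, parent=PARENT):
    """Return True if `category` lies strictly below `ancestor` in the hierarchy."""
    node = parent.get(category)
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False


def copula_subject_is_generic(subject_cat, complement_cat, complement_is_proper=False):
    """Sentence-level copula rule: the subject is generic if its category is a
    descendant of the complement's category. The rule is skipped when the
    complement is headed by a proper noun."""
    if complement_is_proper:
        return False
    return is_descendant(subject_cat, complement_cat)


print(copula_subject_is_generic("mammoth", "animal"))        # manmosu-wa dobutsu-da
print(copula_subject_is_generic("mammoth", "animal", True))  # proper-noun complement
```

In this sketch the first call reports a generic subject, while the second leaves the default (referential) in place because the rule does not fire.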

The next level of rules (level 3) applies to noun phrases modified by embedded sentences.
Japanese makes no phonological, morphological, or syntactic distinctions between restrictive
and non-restrictive relative clauses (Kuno 1973:235). This algorithm uses a simple heuristic:
a noun phrase modified by a tensed embedded sentence is referential.

The next level of rules (level 4) is based on post-modification in the Japanese sentence.
The use of some setsubiji 'suffixes'^7 implies that their modificant is generic, for example
muke 'aimed at' in josei-muke-no-zasshi 'woman aimed-at GEN magazine' a magazine aimed at
women. Similarly, the construction A-to-iu-no-wa 'things called A' implies that its modificant
is generic. It can in fact be thought of as a pseudo-particle, the whole construction acting
as a single marker which has the effect of marking its modificant as being a generic noun
phrase used as the topic^8.

The next level of rules (level 5) makes a noun phrase whose head is modified by a
demonstrative, numeral or the genitive construction NP-no 'NP's' referential. Note that only
noun phrases modified by no judged to be genitive are referential. Partitive constructions
such as okami-no-mure 'pack of wolf' a pack of wolves are not included in this judgment. The
genitive construction may be translated into English in a variety of ways, including a
prepositional phrase headed by 'of', a possessive phrase with a clitic in the determiner position, or
a possessive pronoun.

Finally (level 6), noun phrases headed by nouns that are marked in the lexicon as likely
to have a unique referent, such as chikyu 'the earth', are assumed to be referential.

The algorithm presented in this section is only heuristic. Further work remains to be done
to refine it, in particular: using the wa/ga distinction in conjunction with noun anaphora
relations to distinguish between generic and referential, and improving the rules at
level 3 for relative clauses.
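
The ordered, later-rule-wins cascade of Figure 1 can be sketched as follows. This is a minimal sketch, assuming that the linguistic tests (verb class, copula structure, modifier type and so on) have already been computed and are supplied as boolean features; the NP class, its field names and classify are hypothetical and are not taken from ALT-J/E.

```python
from dataclasses import dataclass


@dataclass
class NP:
    """Pre-computed features of one noun phrase (all names are assumptions)."""
    proper_noun: bool = False
    subject_of_class_verb: bool = False       # rule 2(a)
    copula_subject_descendant: bool = False   # rule 2(b)
    object_of_emotive_verb: bool = False      # rule 2(c)
    copula_complement: bool = False           # rule 2(d)
    appositive: bool = False                  # rule 2(e)
    tensed_relative_clause: bool = False      # rule 3(a)
    generic_postmodifier: bool = False        # rules 4(a), 4(b)
    demonstrative_or_numeral: bool = False    # rule 5(a)
    genitive_modifier: bool = False           # rule 5(b)
    unique_referent: bool = False             # rule 6


def classify(np):
    """Apply the levels in order; a later level overrules an earlier one."""
    if np.proper_noun:
        return "referential"
    reference = "referential"                                   # level 1: default
    if (np.subject_of_class_verb or np.copula_subject_descendant
            or np.object_of_emotive_verb):
        reference = "generic"                                   # level 2(a)-(c)
    if np.copula_complement or np.appositive:
        reference = "ascriptive"                                # level 2(d)-(e)
    if np.tensed_relative_clause:
        reference = "referential"                               # level 3
    if np.generic_postmodifier:
        reference = "generic"                                   # level 4
    if np.demonstrative_or_numeral or np.genitive_modifier:
        reference = "referential"                               # level 5
    if np.unique_referent:
        reference = "referential"                               # level 6
    return reference


print(classify(NP(subject_of_class_verb=True)))   # manmosu-wa zetsumetsu-shita
print(classify(NP(copula_complement=True, demonstrative_or_numeral=True)))
```

The second call illustrates the overruling behaviour: a copula complement that is later judged referential corresponds to an equative use of the copula.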



6 If the complement is later judged to be referential by a subsequent rule it is equivalent to judging that 
the copula has been used equatively. 

7 setsubiji are a Japanese part of speech made up of suffixes that cannot stand alone, but change the meaning 
of the word they modify. 

8 In ALT-J/E the entire construction (and the similar construction A-to-iu-mono-wa 'things called A')
is rewritten during the Japanese rewriting stage into a pseudo-particle (Shirai et al. 199^), which marks its
modificant as being a generic noun phrase in the ha-case (topic). It is not however necessary to do this,
as shown in Murata and Nagao (1993), where this construction is found by matching against the Japanese
dependency structure.
4 Using noun phrase referentiality to select articles and determine number



Knowledge of a noun phrase's referential use is essential when translating from Japanese to 
English, as it plays a large part in determining how a noun phrase is expressed in English. In 
this section we show how articles and number are generated differently for the three different 
referentialities in the machine translation system ALT-J/E. Correct generation of articles 
and number is important not only to express meaning accurately, but because it is one of the 
major factors in determining the readability of Japanese-to-English translations. 

4.1 Translation of generic noun phrases 

A generic noun phrase (with a countable head noun) can generally be expressed in three ways
(Huddleston 1984). We call these GEN 'a', where the noun phrase is indefinite: A mammoth
is a mammal; GEN 'the', where the noun phrase is definite: The mammoth is a mammal;
and GEN ∅, where there is no article: Mammoths are mammals. Uncountable nouns and
pluralia tantum can only be expressed by GEN ∅ (e.g. Furniture is expensive). They cannot
take GEN 'a' and they do not take GEN 'the', because then the noun phrase would normally be
interpreted as having definite reference. Nouns that can be either countable or uncountable
take only GEN ∅ or 'a': Cake is delicious/Cakes are delicious, A cake is a kind of food. These
combinations are shown in Table 1. Noun phrases that cannot be used to show generic
reference are marked with an asterisk (*).



Table 1: Genericness and Countability

GEN type    Noun Countability Preference
            Countable      Both          Uncountable

'a'         a mammoth      a cake        *a furniture
'the'       the mammoth    *the cake     *the furniture
∅           mammoths       cake/cakes    furniture



The use of all three kinds of generic noun phrases is not acceptable in some contexts,
for example *a mammoth evolved. Sometimes a noun phrase can be ambiguous, for example
I like the elephant, where the speaker could like a particular elephant, or all elephants.

Because the use of GEN ∅ is acceptable in all contexts, ALT-J/E generates all generic
noun phrases as such, that is, as bare noun phrases. The number of the noun phrase depends
on the countability preference of the noun heading it, and there will be no article.



4.2 Translation of referential noun phrases 



The countability and number of referential noun phrases can be determined with heuristics
that use information from the Japanese sentence along with knowledge of English countability
stored in the lexicon. This is described in Bond et al. (1994).



According to Quirk et al. (1985:265), for referential noun phrases: 
The definite article the is used to mark the phrase it introduces as referring to
something which can be identified uniquely in the contextual or general knowledge 
shared by speaker and hearer. 

Whether a referential noun phrase is definite or not is determined using heuristic
criteria based on whether there is enough information to uniquely identify the noun phrase's
referent, such as the following:

• if the head noun is marked in the lexicon as being unique: 
the earth 

• if the noun phrase is made logically unique by a modifier: 
the best price 

• if the noun phrase's referent is restrictively described: 
the man who came to dinner, the aim of this research 

• direct and indirect anaphoric reference:
I saw a cat and a dog. The dog chased the cat.

As the above criteria are only meaningful for referential noun phrases, it is essential 
to determine whether the noun phrase is referential as a first step. 
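
As a rough illustration of how the criteria listed above might combine, the following sketch treats a referential noun phrase as definite if any one criterion holds. It is only a sketch under stated assumptions: the function name, its parameters and the set of previously mentioned heads are hypothetical, and only direct anaphora (exact repetition of a head noun) is modelled; indirect anaphora would need inference that is not attempted here.

```python
def is_definite(head, unique=False, logically_unique=False,
                restrictively_described=False, mentioned=()):
    """Return True if any of the definiteness criteria applies.

    unique                  -- head marked in the lexicon as unique (the earth)
    logically_unique        -- made unique by a modifier (the best price)
    restrictively_described -- e.g. the man who came to dinner
    mentioned               -- heads already introduced in the discourse
    """
    return (unique or logically_unique or restrictively_described
            or head in mentioned)


# "I saw a cat and a dog. The dog chased the cat."
mentioned = set()
phrases = []
for head in ["cat", "dog", "dog", "cat"]:
    article = "the" if is_definite(head, mentioned=mentioned) else "a"
    phrases.append(f"{article} {head}")
    mentioned.add(head)

print(phrases)
```

Here the first mention of each noun is indefinite and each later mention is definite, matching the cat and dog example.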

When it has been determined whether a noun phrase is definite or indefinite, then articles
can be generated^9. In the final stage of processing, if there is no determiner, definite noun
phrases take the definite article the. Indefinite countable singular noun phrases will take the
indefinite article a/an, while indefinite countable plural and uncountable noun phrases will
take the zero article ∅. This is summarized in Table 2.



Table 2: Generation of articles for referential noun phrases.

Noun Phrase Number    Definite    Indefinite

Countable singular    the         a/an
Countable plural      the         ∅
Uncountable           the         ∅



4.3 Translation of ascriptive noun phrases 

The countability and number of ascriptive noun phrases matches that of their subject, and 
the countability and number of two appositive noun phrases match each other as described 
in Bond et al. (1994) , with the following proviso. If one element is plural and the other is a 
collective noun such as group, then they need not match. For example, many insects, a whole 
swarm, ... as opposed to many insects, bees I think, .... 

ALT-J/E makes the simplifying assumption that all ascriptive noun phrases are
indefinite. Therefore, articles will be generated in the same way as for indefinite referential
noun phrases: countable singular noun phrases will take the indefinite article a/an,
and countable plural and uncountable noun phrases will take the zero article ∅.
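
The article choices described in Sections 4.1 to 4.3 can be collected into one small function. The sketch below is an illustration only: select_article and its argument values are hypothetical names, the empty string stands for the zero article, and the existing-determiner check and the generation of possessives or some/any are left out.

```python
def select_article(reference, number, definite=False):
    """Pick an article from the reference type and the noun phrase number.

    reference -- "generic", "referential" or "ascriptive"
    number    -- "singular" (countable singular), "plural" (countable plural)
                 or "uncountable"
    Returns "the", "a/an" or "" (zero article).
    """
    if reference == "generic":
        return ""                  # generic noun phrases are generated bare
    if reference == "ascriptive":
        definite = False           # simplifying assumption: always indefinite
    if definite:
        return "the"
    return "a/an" if number == "singular" else ""


for args in [("referential", "singular", True),
             ("referential", "plural", False),
             ("ascriptive", "singular", True),
             ("generic", "plural", False)]:
    print(args, repr(select_article(*args)))
```

Routing ascriptive noun phrases through the indefinite branch keeps a single copy of the indefinite rules, in line with the statement that they are generated in the same way as indefinite referential noun phrases.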
9 As well as generating definite and indefinite articles, ALT-J/E also generates possessive pronouns (Bond
et al. 1995) and some/any for referential noun phrases when appropriate.
5 Results



The processing described above has been implemented in ALT-J/E. The rules were designed
using data from a specially constructed set of test sentences collected by the authors. The
algorithm was evaluated on a collection of newspaper articles from the Nikkei-Sangyou
newspaper by an English native speaker not connected with the development of the algorithm.
The results are summarized in Table 3.



Table 3: Correct Generation of Articles and Number

        Test Sentences                  Newspaper Articles
        NPs (240)    Sentences (120)    NPs (717)    Sentences (102)

New:    94%          90%                77%          15%
Old:    70%          46%                65%          5%

New shows the results using the proposed method.
Old shows the results using the unmodified system.



We tested the system on newspaper articles. In the articles tested, there were an average
of 7 noun phrases in each sentence. The articles were translated by ALT-J/E and the raw
output examined by an English native speaker. Each noun phrase was given one of the
following scores:

structure: problem with structure or choice of translation^10
best: the most appropriate article/number
article: inappropriate article
number: inappropriate number
possessive: inappropriate use of possessive determiner
countability: problem with countability
reference: problem with referential property

For the purpose of evaluating the generation of articles and number, noun phrases that were 
either the best possible translation, or that had a problem only with structure/choice 
of translation, were judged to be successful. A third-party evaluator gave the success 
rates as 77% for the system with the proposed method and 65% for the original system. The 
method of evaluation described above does not give a reproducible, absolute level of success. 
It does, however, successfully show the overall level of improvement/degradation, and help to 
identify the remaining problems. 
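
The success criterion above amounts to a simple count. The sketch below is illustrative: the label strings follow the score names listed earlier, the toy list of scores is invented for the example, and success_rate is a hypothetical helper. The last line checks the reported 77% against the 168 problem noun phrases out of 717 given in the Discussion.

```python
# Scores that count as success for articles and number.
SUCCESS_SCORES = {"best", "structure"}


def success_rate(scores):
    """Fraction of noun phrases scored 'best' or 'structure'."""
    return sum(score in SUCCESS_SCORES for score in scores) / len(scores)


# Invented toy evaluation of five noun phrases.
toy_scores = ["best", "structure", "article", "best", "number"]
print(success_rate(toy_scores))

# Consistency check with the reported figures: 717 noun phrases, 168 with errors.
print(round(100 * (717 - 168) / 717))
```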

Our initial evaluation was done by the authors, who found the success rates at the
noun phrase level to be 92% for the proposed method and 76% for the system as it used
to be. Nakazawa points out that this shows that the evaluation method is not reproducible

10 This includes any major problems not connected with articles or number, such as outputing Japanese 
characters or spelling errors. 
(personal communication May 1995). Because the goal is to produce a translation, which is
new text, there is no objective target to compare the results with. This is a perennial problem
for machine translation output. Knight and Chander, in a small pilot study, showed that
humans could replace articles (a/an and the) in an English text in which the articles had
been replaced by blanks with an accuracy of around 95%. Raw machine translation output
is less coherent than normal English text, and so deciding which article is appropriate is an
even harder task.



6 Discussion 

In this section we discuss the remaining errors and compare the results to two other systems. 

168 of the 717 noun phrases in the machine translation of the newspaper articles had some
problem. A brief analysis of the errors is given in Table 4.

Testing on the newspaper articles revealed one major heuristic that had been overlooked
in the algorithm presented in Section 3: some nouns, when heading a construction such as
'N-of-NP', carry an implication that the complement NP has generic reference: for example,
the applications of databases. This rule will be added to the algorithm at level [B], reducing the
number of errors by around 8%. Apart from this there were no major changes that needed
to be made to the algorithm.

Overall, the largest sources of errors are problems with the source language analysis and 
dictionaries (22% each). These are not problems with the proposed algorithm but with the 
machine translation system as a whole. Another major source of errors is the translation of 
numerical expressions (12%). The processing for handling numerical expressions is currently 
being overhauled. The errors caused by lack of information in the dictionaries are solvable 
immediately, which will reduce the number of errors by around 20%. 

In the generation of articles and numbers for referential noun phrases some of the 
errors can simply be solved by the addition of new rules: for example, adding rules which use 
the meaning of adverbs to determine number or rules using pre-head modifiers to determine 
definiteness. The problems of common sense deduction and indirect anaphora, however, 
require a large scale knowledge base and inference rules. While both are being researched 
at the moment, they are unlikely to be implemented soon. We estimate that the number of
errors caused by insufficiencies in the generation of articles and numbers for referential
noun phrases can be reduced by at least a quarter, thus reducing the total number of errors by
around 8%.

Combining the above figures, we predict it is possible to reduce the errors by around 
30%, bringing the total success rate to 84% for a window test. To go beyond this needs new 
processing to improve the source language analysis, the translation of numerical expressions 
and more use of contextual inferences. 
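
The predicted success rate follows from the error counts by simple arithmetic. The sketch below is a worked check, assuming the 717 noun phrases and 168 errors reported for the newspaper articles; projected_success is a hypothetical helper name.

```python
def projected_success(n_noun_phrases, n_errors, error_reduction):
    """Success rate after removing a given fraction of the current errors."""
    remaining_errors = n_errors * (1 - error_reduction)
    return (n_noun_phrases - remaining_errors) / n_noun_phrases


current = projected_success(717, 168, 0.0)     # no reduction: the current system
predicted = projected_success(717, 168, 0.30)  # errors reduced by around 30%

print(round(100 * current), round(100 * predicted))
```

With a 30% reduction about 118 of the 168 errors remain, which gives the predicted rate of roughly 84%.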

In addition, in examining even this small sample of text we came up with one major addition
to the algorithm for determining noun phrase reference. Therefore the algorithm needs to be
tested on a wider range of texts before the rules can be considered comprehensive. We have
started testing the algorithm on a larger corpus of newspaper articles and are investigating 
methods for automatically learning rules. 



In Murata and Nagao (1993), success rates of 68.9% for referential property and 85.6% for number
were given for unknown texts of the same genre as that used in development of the rules.
Their approach seems effective, although we predict the lack of an ascriptive class will cause
Table 4: Errors in the generation of articles and number

There were 168 errors in the 717 noun phrases
that appeared in the machine translation of the newspaper articles

Problem Area            Freq.   Description of error

Analysis error          22%     The Japanese noun phrase was parsed incorrectly
                                so the rules did not trigger.
Dictionary errors       22%     The dictionary entry was incomplete.
Numerical expressions   12%     Complicated numerical expressions are translated
                                badly: for example 384 Kbits of networks per
                                second should be a 384 Kbit/s network
Reference                       There needs to be a rule to make database generic
                                in expressions like: the strategic applications of
                                databases which is currently translated as the
                                strategic applications of a database
Reference               5%      Miscellaneous errors in determining noun phrase
                                reference.
Number                          In some cases rules using common sense and
                                inference are needed to determine the number
                                correctly: for example sales counter should be
                                plural in the sales counter of telephone companies
                                through out the country
Number                          There are no rules to deduce number from
                                information given by adverbs: for example prices
                                should be plural in The price is 5 yen and 15 yen
                                respectively
Articles                7%      The rules for deciding whether a noun has been
                                restrictively described by an embedded sentence
                                are too coarse.
Articles                6%      There needs to be a rule for indirect anaphora:
                                two models should be definite in NTT introduced
                                video-tel 111 and video-tel 222 in June. Two
                                models are the first to have video receivers.
Articles                3%      There needs to be a rule to make a noun phrase
                                definite if its pre-head modifier restricts it
                                sufficiently: for example NTT will enter a video
                                rental business
Articles                4%      Miscellaneous errors in determining whether a
                                noun phrase is definite or not.
problems. It is impossible to directly compare our results as Murata's testing was all carried
out in Japanese by the developers, so the problems of actually generating the English and 
getting an impartial evaluation were not addressed. Setting these considerations aside, when 
we separate our results for noun phrase reference (counting as failures noun phrases with errors 
in article use, noun phrase reference or the use of possessive determiners), and countability 
and number (counting as failures noun phrases with errors in number or countability), our 
proposed algorithm gave success rates of 74% and 85% respectively. 

Another approach is that of Knight and Chander, who proposed using an automated
post-editor to correct articles. Their prototype has a success rate for learning to replace 
articles when they have been removed from English texts of 78%. At present however the 
prototype cannot be used to post-edit output from a typical machine translation system as 
it assumes the knowledge that an article should be used in a given position, which is not 
normally available, and that the generation rules can function using machine translation 
output, which has not been shown. 



7 Conclusion 

This paper proposes a method that uses the information available in a Japanese sentence to 
identify a noun phrase as being used either generically, referentially or ASCRIPTIVELY. 
This distinction is shown to be both theoretically justified and practically useful. The three-way
distinction in noun phrase reference is used as a base to determine a noun phrase's number
and to generate appropriate articles and possessive pronouns when translating from Japanese
to English. Incorporating this method into the machine translation system ALT-J/E helped
to improve the percentage of noun phrases with correctly generated articles and number from 
65% to 77%. It is shown that the proposed method can be extended straightforwardly to 
increase the success rate to 84%. 

Several problems remain to be explored. We consider the following to be of primary
importance:

1. Extension of the algorithm to translate texts as coherent passages, not just as single 
sentences. 

2. Improvement of the reproducibility of the evaluation method. 

3. Investigation of the coverage of the algorithm on a wider collection of texts. 